Abstract:

We propose a new technique for reducing the effect of noise on contrast fusion of multimodal imagery, yielding higher-quality results than can be obtained with previous methods. We rely on local non-parametric estimation of band gradient entropies as a relative measure of geometric structure. This work builds upon previous work by the author and is exemplified with remote sensing applications. Quantitative measures of performance are given on ground-truth data, which indicate the advantages of the new technique over existing approaches.

Keywords: Multispectral image processing, image fusion, remote sensing, adaptive noise estimation, entropy estimation.


I. Introduction

Increasing availability of co-registered multimodal imagery from diverse sources, such as magnetic resonance scanners and aerial and earth-orbiting sensors, has spurred the development of numerous techniques for image fusion and visualization [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. These methods can be roughly categorized into two classes: those which work on the zeroth-order properties of the images, and those that use higher-order information such as first or second derivatives. We will refer to the latter as contrast fusion methods, since they attempt to reproduce a combination of local contrast from each modality in the fused image product. These methods appear to hold the greatest promise, despite their computational burden, since they are well adapted to the physiological basis of contrast perception in the low-level human visual system [13], [14].

Fig. 1. Planar projections of the sensor noise equal-probability ellipses, computed for a Cohu 2200 color camera, in RGB coordinates.

Contrast-based fusion techniques typically rely on the absolute magnitude of derivatives as a means of determining which bands of a multimodal image dominate the result at a given point. A common rule is to choose the maximum contrast among the bands and prescribe it as the contrast for the fused image, with the rationale that large contrast correlates with visually relevant image features. While this assumption may hold for noise-free images, it fails when one or several bands are corrupted by noise. In that situation, large contrast may correspond to variation due to noise rather than to underlying image features. Thus any fusion rule that assumes all contrast to be due to valid signal variation will inevitably produce unsatisfactory results in the presence of noise. The present article proposes a new technique for minimizing the adverse effects of noise on contrast fusion algorithms, thereby rendering them more widely applicable. Most or all previous contrast fusion algorithms could be adapted to take advantage of this new technique. However, the algorithms previously developed by the author and his co-workers are especially well suited to incorporate the ideas introduced in this paper, and therefore these algorithms are used as a model.
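To make the failure mode of the maximum-contrast rule concrete, here is a minimal sketch of a generic "choose-max" rule (not any specific published algorithm; the function name is hypothetical). It is applied to a clean band containing a single step edge and to a band of pure Gaussian noise with mean 127 and standard deviation 40, values borrowed from the experiment in section V. Since the noise band has the larger gradient magnitude at nearly every pixel, the rule selects it almost everywhere.

```python
import numpy as np


def max_contrast_choice(bands):
    """Return, per pixel, the index of the band with the largest gradient magnitude.

    bands: array of shape (n, H, W). This is a generic "choose-max" rule,
    used here only to illustrate how a noisy band can dominate it.
    """
    gy, gx = np.gradient(np.asarray(bands, dtype=float), axis=(1, 2))
    return np.argmax(np.hypot(gx, gy), axis=0)


rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[:, 32:] = 10.0                          # a single vertical step edge
noise = rng.normal(127.0, 40.0, (64, 64))     # a band of pure Gaussian noise
choice = max_contrast_choice(np.stack([clean, noise]))
# Fraction of pixels at which the rule picks the noise band (close to 1).
print((choice == 1).mean())
```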

The outline of this article is as follows. In section II we briefly review the contrast fusion algorithm proposed in [5], [13]. Section III describes a technique for incorporating information about the sensor noise characteristics into the fusion process. A locally adaptive merging rule that takes into account absolute contrast magnitude and image noise is introduced in section IV. Section V shows experimental results obtained on various images. Finally, section VI provides some conclusions and outlines directions for future improvement.

II. Contrast fusion model

The method introduced below applies to several contrast fusion models, such as those based on Laplacian pyramids and wavelet transforms. For simplicity we introduce it in the context of the model in [5] and the extensions in [13], which we review in this section.

Let $\Omega \subset \mathbb{R}^2$ denote the image domain (usually a rectangle), and consider the trivial Riemannian vector bundle $P = (\Omega, \Omega \times \mathbb{R}^n, g)$, where $\Omega$ is the base space, $\Omega \times \mathbb{R}^n$ is the total space, $g$ is a fiber-wise constant metric, and the projection map is simply $\pi(x, y, v) = (x, y)$, for $(x, y) \in \Omega$, $v \in \mathbb{R}^n$. Now, define an n-band image as a section $S$ of $P$, that is, a map of the form $\Omega \ni (x, y) \mapsto (x, y, s(x, y))$, with $s : \Omega \to \mathbb{R}^n$. One may then write $S = \mathrm{Id} \oplus s$, where $\mathrm{Id}$ denotes the identity map on $\mathbb{R}^2$. The corresponding contrast form is defined as $\chi = S^*h - \delta$, where $\delta$ is the Euclidean metric on $\mathbb{R}^2$, $h$ is the metric on the total space (Euclidean on the base and $g$ on the fibers), and $S^*$ denotes the pullback by $S$ of forms on the total space. This is a bilinear form, which is usually degenerate, since $s$ may have vanishing differential.
In coordinates, the contrast form is given by

$$\chi_{ij}(p) = \sum_{k,l=1}^{n} g_{kl}(S(p))\,\frac{\partial s^k}{\partial x_i}(p)\,\frac{\partial s^l}{\partial x_j}(p),$$

for $1 \le i, j \le 2$ and $p \in \Omega$, with $x_1 = x$ and $x_2 = y$.
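As a worked special case of the coordinate expression, take the Euclidean fiber metric $g_{kl} = \delta_{kl}$. The form is then a sum of per-band outer products of gradients:

$$\chi = \sum_{k=1}^{n} \nabla s^k\,(\nabla s^k)^{\mathsf T} = \begin{pmatrix} \sum_k (\partial_x s^k)^2 & \sum_k \partial_x s^k\,\partial_y s^k \\ \sum_k \partial_x s^k\,\partial_y s^k & \sum_k (\partial_y s^k)^2 \end{pmatrix}.$$

For $n = 1$ this is the rank-one matrix $\nabla s\,(\nabla s)^{\mathsf T}$, whose only nonzero eigenvalue is $|\nabla s|^2$, with eigenvector $\nabla s$. This illustrates how $\chi$ can be degenerate, and why the square root of its largest eigenvalue recovers the ordinary gradient magnitude.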

The contrast vector field $V$ on $\Omega$ (a section of $T\Omega$) is defined so that its magnitude is the square root of the largest eigenvalue of $\chi$ and its direction lies in the corresponding eigenspace, and so that its orientation agrees with that of the gradient of an auxiliary function $\phi : \Omega \to \mathbb{R}$, usually chosen to be $\sum_k s^k$ (but see [13] for other choices). It is straightforward to verify that if $n = 1$ and the metric on the bundle $P$ is globally Euclidean, then $V = \nabla s$; hence the contrast vector field reduces to the standard notion of first-order contrast for grayscale images. In the case $n > 1$, the contrast vector field encodes the combination of contrast in all bands of the image $S$. Therefore, a grayscale image, given as an intensity map $I : \Omega \to \mathbb{R}$, whose gradient equals $V$ is a fused realization of the contrast in all bands of $S$. Note that the bundle metric $g$ controls how contrast from each band contributes to the contrast of the fused image, and furthermore how contrast interacts among bands. If $g$ is chosen to be Euclidean on each fiber, then all bands contribute equally and do not interact with each other non-linearly.
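The construction of $V$ can be sketched numerically as follows. This is a minimal sketch, assuming forward differences for the partial derivatives, a spatially constant fiber metric $g$ (Euclidean by default) and the auxiliary function $\phi = \sum_k s^k$; the function name and discretization are illustrative choices, not the author's implementation. The usage lines check the $n = 1$ reduction $V = \nabla s$ stated above.

```python
import numpy as np


def contrast_vector_field(s, g=None):
    """Contrast vector field (vx, vy) for an n-band image s of shape (n, H, W).

    g: constant symmetric (n, n) fiber metric (Euclidean if None).
    Derivatives are forward differences; phi = sum_k s^k orients V.
    """
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    g = np.eye(n) if g is None else np.asarray(g, dtype=float)
    dx = np.zeros_like(s)
    dy = np.zeros_like(s)
    dx[:, :, :-1] = s[:, :, 1:] - s[:, :, :-1]
    dy[:, :-1, :] = s[:, 1:, :] - s[:, :-1, :]
    # Contrast form chi_ij = sum_kl g_kl (d_i s^k)(d_j s^l), stored as [[a, b], [b, c]].
    gdx = np.einsum("kl,lhw->khw", g, dx)
    gdy = np.einsum("kl,lhw->khw", g, dy)
    a = np.sum(dx * gdx, axis=0)
    b = np.sum(dx * gdy, axis=0)
    c = np.sum(dy * gdy, axis=0)
    # Largest eigenvalue of the symmetric 2x2 matrix.
    lam = 0.5 * (a + c + np.sqrt((a - c) ** 2 + 4.0 * b ** 2))
    # Two candidate eigenvectors; keep the one with the larger norm (better conditioned).
    e1x, e1y = lam - c, b
    e2x, e2y = b, lam - a
    use1 = e1x ** 2 + e1y ** 2 >= e2x ** 2 + e2y ** 2
    ex = np.where(use1, e1x, e2x)
    ey = np.where(use1, e1y, e2y)
    norm = np.hypot(ex, ey)
    ok = norm > 0.0
    safe = np.where(ok, norm, 1.0)
    ex = np.where(ok, ex / safe, 1.0)  # isotropic or zero chi: fall back to (1, 0)
    ey = np.where(ok, ey / safe, 0.0)
    # Orient V along the gradient of phi = sum_k s^k.
    px, py = dx.sum(axis=0), dy.sum(axis=0)
    sign = np.where(ex * px + ey * py < 0.0, -1.0, 1.0)
    mag = np.sqrt(np.maximum(lam, 0.0))
    return sign * mag * ex, sign * mag * ey


rng = np.random.default_rng(1)
s1 = rng.random((1, 20, 30))
vx, vy = contrast_vector_field(s1)
gx = np.zeros((20, 30)); gx[:, :-1] = s1[0, :, 1:] - s1[0, :, :-1]
gy = np.zeros((20, 30)); gy[:-1, :] = s1[0, 1:, :] - s1[0, :-1, :]
print(np.allclose(vx, gx) and np.allclose(vy, gy))  # True: V = grad s when n = 1
```

Picking the larger of the two candidate eigenvectors avoids the cancellation that occurs when one gradient component is much smaller than the other.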

It is normally not possible to solve the equation $\nabla I = V$ exactly as a means of constructing the fused image, since $V$ need not be a gradient field (it is in general not curl-free). We therefore resort to finding the best $L^2$-approximate solution, which is characterized (up to a constant) as a solution of the Neumann problem

$$\begin{cases} \Delta I = \operatorname{div} V & \text{on } \Omega,\\ \nabla I \cdot \mathbf{n} = 0 & \text{on } \partial\Omega, \end{cases}$$

where $\mathbf{n}$ is the outward unit normal. This is the desired fusion of the multiple bands in the original image. For connections with wavelet-based fusion algorithms and $L^1$-approximate solutions see [13]. There are many efficient numerical methods for solving this equation, and we will not discuss this topic in the current article (see [5] and the references therein).
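For illustration, here is a minimal sketch of one such method, assuming forward-difference gradients and a DCT-based solver; it is not necessarily the solver used in [5], and the function names are hypothetical. Defining the discrete divergence as the negative adjoint of the discrete gradient makes the linear system the normal equation of the least-squares problem $\min_I \|\nabla I - V\|^2$, and the type-II DCT diagonalizes the resulting Laplacian with reflective (Neumann) boundaries.

```python
import numpy as np
from scipy.fft import dctn, idctn


def forward_gradient(I):
    """Forward differences; the last column (x) and last row (y) are zero."""
    gx = np.zeros_like(I, dtype=float)
    gy = np.zeros_like(I, dtype=float)
    gx[:, :-1] = I[:, 1:] - I[:, :-1]
    gy[:-1, :] = I[1:, :] - I[:-1, :]
    return gx, gy


def divergence(vx, vy):
    """Negative adjoint of forward_gradient (backward differences)."""
    vx = np.array(vx, dtype=float)
    vy = np.array(vy, dtype=float)
    vx[:, -1] = 0.0  # boundary-normal components are ignored, matching forward_gradient
    vy[-1, :] = 0.0
    d = vx.copy()
    d[:, 1:] -= vx[:, :-1]
    d += vy
    d[1:, :] -= vy[:-1, :]
    return d


def solve_neumann_poisson(f):
    """Solve Lap(I) = f with reflective (Neumann) boundaries; zero-mean solution."""
    h, w = f.shape
    F = dctn(f, type=2, norm="ortho")
    lam_y = 2.0 * np.cos(np.pi * np.arange(h) / h) - 2.0  # 1-D Neumann Laplacian eigenvalues
    lam_x = 2.0 * np.cos(np.pi * np.arange(w) / w) - 2.0
    denom = lam_y[:, None] + lam_x[None, :]
    denom[0, 0] = 1.0  # constant mode has eigenvalue 0; handled below
    U = F / denom
    U[0, 0] = 0.0      # fix the free additive constant (zero mean)
    return idctn(U, type=2, norm="ortho")


def fuse_from_field(vx, vy):
    """Least-squares image whose discrete gradient best matches (vx, vy)."""
    return solve_neumann_poisson(divergence(vx, vy))


rng = np.random.default_rng(0)
I0 = rng.random((24, 40))
I = fuse_from_field(*forward_gradient(I0))
print(np.allclose(I, I0 - I0.mean()))  # True: an exact gradient field is recovered up to a constant
```

In a fusion pipeline the contrast vector field $V$ would be passed to `fuse_from_field` in place of an exact gradient.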


III. Sensor-dependent metric

Wolff and Socolinsky [15] proposed a method for handling sensor noise in the context of edge detection in multispectral images, which can also be used to account for sensor characteristics within the fusion model of section II. Their approach is based on a generalization of the MacAdam color discrimination metric of the human eye [16] to artificial camera sensors, and models the inherent noise properties of the sensor as a function of position in photometric space. Let us explain the idea for a standard 3-color camera. Let $X = (x_1, x_2, x_3)$ be the color coordinates, in some fixed coordinate system, of an incoming light stimulus. The camera response $C(X)$ for a stimulus of color $X$ is a random variable, which they assume to be normally distributed with mean $X$ and covariance $\Sigma(X)$. The parameter $\Sigma(X)$ measures the reliability of the sensor response for stimuli with color coordinates $X$. Thus, it is natural to define a global metric on photometric space, independent of spatial coordinates, by

$$g(X) = \Sigma^{-1}(X).$$

Using this metric, contrast from each band is weighted according to the noise characteristics of the sensor for each particular color, whereby noisier bands contribute less to the contrast form than bands with lower variance, in which we have higher confidence. Figure 1 shows planar projections of the equi-probability ellipses computed for a Cohu 2200 color camera, represented in RGB coordinates. The main problem with this approach is that computing the metric requires access to the sensor, and a rather substantial experiment must be carried out. Even then, the noise properties of the sensor may depend on many changing factors, such as environmental conditions, and these should be taken into account. Consequently, this construction may be of theoretical, more so than computational, interest.
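The weighting effect of $g(X) = \Sigma^{-1}(X)$ can be sketched as follows, evaluating the metric at each pixel's color $X = s(p)$. The noise model below (independent channels whose variance grows linearly with intensity, `read_var + gain * X`) is a hypothetical stand-in for a measured covariance such as the one behind Figure 1; the function names and parameter values are illustrative only.

```python
import numpy as np


def sensor_covariance(X, read_var=4.0, gain=0.5):
    """Hypothetical noise model: independent channels, variance read_var + gain * X."""
    return np.diag(read_var + gain * np.asarray(X, dtype=float))


def contrast_form_sensor_metric(s, cov_fn=sensor_covariance):
    """Contrast form chi(p) = J(p)^T g(s(p)) J(p), with g(X) the inverse of Sigma(X).

    s: (n, H, W) image; J(p) is the n x 2 Jacobian of s (forward differences).
    Returns an array of shape (H, W, 2, 2).
    """
    s = np.asarray(s, dtype=float)
    n, H, W = s.shape
    dx = np.zeros_like(s)
    dy = np.zeros_like(s)
    dx[:, :, :-1] = s[:, :, 1:] - s[:, :, :-1]
    dy[:, :-1, :] = s[:, 1:, :] - s[:, :-1, :]
    chi = np.zeros((H, W, 2, 2))
    for r in range(H):
        for c in range(W):
            g = np.linalg.inv(cov_fn(s[:, r, c]))             # metric at color X = s(p)
            J = np.stack([dx[:, r, c], dy[:, r, c]], axis=1)  # n x 2 Jacobian
            chi[r, c] = J.T @ g @ J
    return chi


# Two bands with the same 10-unit step; band 1 sits at a brighter level,
# so under this model it is noisier and contributes less to chi.
s = np.array([[[0.0, 10.0]],
              [[100.0, 110.0]]])
chi = contrast_form_sensor_metric(s)
print(chi[0, 0])  # chi_xx = 100/4 + 100/54; all other entries are 0
```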

IV. Locally adaptive metric

Not all image noise can be accounted for by the sensor-dependent metric from the previous section. Furthermore, we often have no access to the sensor, or cannot account for all the variables that figure into the noise metric. Under such circumstances, it may be advantageous to choose the metric adaptively, so as to maximize some information criterion. It is not possible to separate signal from noise within a single band, but with multiple bands we may hope to determine to what degree contrast from one band is more 'structured' than that from another, and then use this knowledge to decide on an appropriate weighting.

It is tempting to consider the local mutual information between the gradients of different bands as a measure of consistency. However, it is easy to see that mutual information does not give us the kind of measure we are looking for. Recall that the mutual information of independent random variables is zero; yet the gradients in different bands may be uncorrelated for very different reasons. Consider for example a three-band image set consisting of a thermal-infrared band, a range scanner band and an intensified near-infrared band, acquired under extremely low light conditions. In this situation, the intensified band will be very noisy, while the other two bands will not be adversely affected by the illumination. Now certain pixels will show strong edges in the range band, due to depth discontinuities without temperature change, and at other pixels the reverse will be true, due to thermal variations with no change in depth. For a neighborhood of these pixels the mutual information between the respective gradients may be low, but this is exactly the kind of situation where we want to take into account the input from both bands (or at least the one with the strongest gradients). On the other hand, at certain image pixels we will have strong gradients in the intensified near-IR band due entirely to noise, and weaker but relevant edges in another band due to the true signal. Here the mutual information will still be low, but the two bands should not contribute equally to the mixture. In particular, the band with the largest gradients should not be weighted more heavily, since it is the noisier one.

More generally, one cannot appeal to methods which exploit any sort of consistency or correlation across bands, since in the most interesting image fusion scenarios that correlation may be null. Indeed, among the most profitable applications of image fusion are those in which different image bands carry complementary and uncorrelated information. In these cases several bands are necessary to get a complete, informative picture. What we would like to do is combine the band gradients so that the image obtained from the resulting contrast form takes optimal advantage of each band, emphasizing structured bands and de-emphasizing noisy ones. We now propose a method which yields good results but has no provably optimal properties. The optimal choice of metric, if such an object can in fact be characterized, is an open problem.

Entropy-based techniques have been used in image processing for tasks such as reconstruction from incomplete or noisy data [17] and removal of motion artifacts in MRI [18]. Let $S = \mathrm{Id} \oplus s$, with $s : \Omega \to \mathbb{R}^n$, be an n-band image. For each band $s^k$, $k = 1, \dots, n$, of $S$ and each pixel $(x, y) \in \Omega$, we wish to consider the entropy of the random vector $\nabla s^k$ in a discrete pixel neighborhood of $(x, y)$.

For simplicity, consider a square pixel neighborhood of odd side length $w$, centered at the pixel $(x, y) \in \Omega$, and denote it by $U_w(x, y)$. Circular neighborhoods can also be used, with similar results but greater computational effort. We estimate the local probability density of $\nabla s^k$ over $U_w(x, y)$ non-parametrically by

$$Q(v) = \frac{1}{w^2 \lambda^2} \sum_{p \in U_w(x, y)} K\!\left(\frac{v - \nabla s^k(p)}{\lambda}\right), \qquad v \in \mathbb{R}^2,$$


which gives us an estimate for the local entropy

$$H^k_w(x, y) = -\int_{\mathbb{R}^2} Q \log Q,$$

where, as is customary, $Q \log Q$ is taken to be zero wherever $Q$ vanishes, $K$ is a smoothing kernel (Gaussian in our numerical implementation), and $\lambda$ is a fixed bandwidth. It follows from the definition of entropy that $0 \le H^k_w(x, y) \le 2 \log w$. From this computation, we obtain $n$ maps

$$\tilde H^k_w : \Omega \to [0, w^2], \qquad k = 1, \dots, n,$$

defined by

$$\tilde H^k_w(x, y) = e^{2 \log w - H^k_w(x, y)}.$$
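The density and entropy estimates can be sketched numerically for a single neighborhood as follows. This is a minimal sketch, assuming a standard 2-D Gaussian kernel and a Riemann-sum approximation of the integral on a regular grid. The function name, the grid size and the $4\lambda$ margin are illustrative choices, not taken from the original implementation. Because this computes a differential entropy, its absolute value depends on $\lambda$ and on the gradient units, so it is best read comparatively, across bands over the same neighborhood.

```python
import numpy as np


def kde_gradient_entropy(grads, lam, grid_size=64):
    """Differential entropy -integral(Q log Q) of a 2-D Gaussian KDE of gradients.

    grads: (m, 2) gradient vectors from one neighborhood U_w (m = w * w).
    lam:   kernel bandwidth. The integral is a Riemann sum on a
           grid_size x grid_size grid extending 4 * lam beyond the samples.
    """
    grads = np.asarray(grads, dtype=float)
    m = grads.shape[0]
    lo = grads.min(axis=0) - 4.0 * lam
    hi = grads.max(axis=0) + 4.0 * lam
    u = np.linspace(lo[0], hi[0], grid_size)
    v = np.linspace(lo[1], hi[1], grid_size)
    U, V = np.meshgrid(u, v, indexing="ij")
    pts = np.stack([U.ravel(), V.ravel()], axis=1)                 # (G, 2)
    d2 = ((pts[:, None, :] - grads[None, :, :]) ** 2).sum(axis=2)  # (G, m)
    # Q(v) = 1/(m lam^2) * sum_p K((v - grad_p) / lam), K = standard 2-D Gaussian
    Q = np.exp(-d2 / (2.0 * lam ** 2)).sum(axis=1) / (m * 2.0 * np.pi * lam ** 2)
    cell = (u[1] - u[0]) * (v[1] - v[0])
    Qp = Q[Q > 0]  # convention: 0 log 0 = 0
    return float(-np.sum(Qp * np.log(Qp)) * cell)


rng = np.random.default_rng(0)
structured = np.zeros((49, 2)); structured[:7] = [10.0, 0.0]  # two tight clusters
noisy = rng.normal(0.0, 10.0, size=(49, 2))                    # spread-out gradients
print(kde_gradient_entropy(structured, lam=1.0) < kde_gradient_entropy(noisy, lam=1.0))  # True
```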

Now, we define a spatially varying, locally adaptive, diagonal metric on the trivial bundle $\Omega \times \mathbb{R}^n$ by

$$g_{kl}(x, y) = \begin{cases} \dfrac{n\,\tilde H^k_w(x, y)}{\sum_{j=1}^{n} \tilde H^j_w(x, y)}, & k = l,\\[6pt] 0, & \text{otherwise.} \end{cases} \tag{1}$$

It behooves us now to provide some intuition for the nature of the entropy-weighted metric (1). Consider the difference between an image neighborhood with geometric structure and one which is dominated by noise. In the structured case, we expect the gradients to be concentrated in a few directions, corresponding to edge orientations, and in a few magnitudes, corresponding to edge points versus non-edge points. For a noisy neighborhood, we should expect the gradients to be distributed randomly, spread over a large range of magnitudes and orientations. Therefore, the entropy of the gradient, as a two-dimensional random variable, in an image neighborhood can be expected to be lower for well-structured neighborhoods than for ones dominated by noise. Note that we cannot use the entropy of the raw grayvalues from each band as a weighting criterion. In fact, the local entropy of the grayvalues is invariant under spatial rearrangement of the pixels in the estimation neighborhood, and thus does not convey information about the geometric structure of the image over that neighborhood. On the other hand, local gradient entropy is not invariant under rearrangement, since the spatial location of pixels within the neighborhood is used to compute the gradients, and thus different pixel arrangements yield different gradient distributions.

Figure 2 shows the distribution of gradients for a noisy neighborhood and for a structured one, both pictured at the left of the figure. One can easily see that the distribution of gradients for the unstructured neighborhood is rather uniform. The geometric structure of the second example neighborhood is apparent from the tight clustering of gradient vectors into two groups: those corresponding to the vertical edge through the middle of the neighborhood, and the rest. The metric in (1) exploits this fact by favoring bands with lower local entropy. If all bands have approximately the same gradient entropy in a neighborhood of a given pixel, then no decision can be made and the metric defaults to Euclidean at that point. The free parameters in the metric are the size of the neighborhood over which we estimate the local entropy, and the bandwidth of the kernel estimator, both of which are scale parameters. We use a fixed neighborhood width at every pixel of every band. An adaptive-width entropy estimate could well yield better results, but we have been unable to find an effective adaptation rule, since that rule should somehow depend on the noise properties of the image bands in that neighborhood, which is precisely what we are trying to estimate in the first place.

The entropy metric in (1) provides a pointwise measure of the confidence we have that the contrast from a given band at that point is due to a structured signal and not to random noise. Since only the information in the image being processed is used, we can only provide a relative measure, where the confidence for one band is relative to the properties of the other bands in the same neighborhood. This construction effectively implements Ockham's razor, favoring the simplest alternative. Note that this is independent of the contrast present in each band at a given pixel. Individual band contrasts still compound to create the contrast vector field via the contrast form as in section II, but their relative weighting is given by our entropy metric.

Fig. 2. Example gradient distributions for noisy (green) and structured (red) neighborhoods, shown at left.
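Once the local entropies are available, metric (1) is a per-pixel normalization. Since $\tilde H^k_w = w^2 e^{-H^k_w}$, the factor $w^2$ cancels and $g_{kk} = n\,e^{-H^k_w} / \sum_j e^{-H^j_w}$, that is, $n$ times a softmax of the negated entropies. Below is a minimal sketch, assuming the entropies have already been computed for every band and pixel (for example with a KDE estimate as described above); the function name is hypothetical.

```python
import numpy as np


def entropy_metric_diagonal(H):
    """Diagonal entries g_kk of metric (1) from local gradient entropies.

    H: array of shape (n, ...) holding H^k_w for each band k (any trailing
    pixel shape). Returns g_kk with the same shape; off-diagonal terms are 0.
    """
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    # exp(-H) up to a per-pixel factor; shifting by the minimum avoids overflow,
    # and any per-pixel factor (including w**2) cancels in the ratio.
    e = np.exp(-(H - H.min(axis=0, keepdims=True)))
    return n * e / e.sum(axis=0, keepdims=True)


# Two bands at three pixels: equal entropies, band 0 more structured, band 1 more structured.
H = np.array([[3.0, 2.0, 4.0],
              [3.0, 4.0, 2.0]])
print(np.round(entropy_metric_diagonal(H), 3))
# approximately [[1, 1.762, 0.238], [1, 0.238, 1.762]]
```

Equal entropies give $g_{kk} = 1$ for every band, which is the Euclidean default described above, and the components always sum to $n$.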


V. Experimental Results

Fig. 3. Three bands of a multispectral aerial image acquired over Hawaii (courtesy of Space Imaging Corporation), plus an artificial band of Gaussian noise.

Let us first look at an example from remote sensing. Figure 3 shows three bands of a multispectral aerial image acquired over Hawaii with dynamic range [0, 255], and a fourth, artificial band composed of Gaussian-distributed noise with mean 127 and standard deviation 40. If we fuse these four bands with the method in section II using a Euclidean metric, we obtain the result in Figure 4(a). Note how all the noise in the artificial band has been transferred to the fused image, as it was equally weighted with respect to the other three bands. Any other contrast fusion method that does not explicitly take noise into account will produce comparable results. If instead of a Euclidean metric we use the one constructed in (1), then we obtain the fused image in Figure 4(b), in which the noisy band has been automatically de-emphasized. A side-by-side comparison of subregions of the images in 4(a) and 4(b) is provided in Figure 5. It is evident that much less noise has been transferred to the fused image through the use of the entropy-weighted metric. In fact, much detail is destroyed by the effect of noise in the Euclidean fusion, especially in areas with weak edges in the clean bands which nonetheless represent true structure. This is especially visible in the region obscured by cloud shadows, magnified at the bottom of figure 5. The fusion with the entropy-weighted metric successfully ignores contrast due to noise in the artificial band when more structured contrast is present in the other bands. From a quantitative point of view, the mean signal-to-noise ratio for the Euclidean fusion is 40.47 dB and for the entropy-weighted fusion it is 58.12 dB, an improvement of over 43%.
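For reference, the 43% figure is the ratio of the two decibel values. Read instead as a difference in decibels, and assuming the usual convention that a dB value is $10\log_{10}$ of a power ratio, the same two numbers give:

$$\frac{58.12}{40.47} \approx 1.436, \qquad 58.12 - 40.47 = 17.65\ \text{dB} \;\Longleftrightarrow\; 10^{17.65/10} \approx 58.2.$$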

To gain some more intuition, we can look at the components of the entropy-weighted metric for the example image in figure 3. Figure 6 shows a plot of the diagonal metric coefficients along a horizontal scan-line three-quarters of the way down the image in figure 3, along with a comparison of the average coefficient of the first through third bands versus the coefficient of the fourth (noise) band. We see that whenever the first three bands show more structure at the scale we use for estimating the local entropy, they are favored heavily over the noisy band. On the right side of the plot, which corresponds to the heterogeneous region of tightly packed houses on the lower right side of the image, the weight on the noise band becomes comparable to the other three weights, as the local entropy estimation cannot differentiate structure from noise at the given scale.

Fig. 6. Top: Diagonal components of the entropy-weighted metric along a horizontal scan-line of the image in figure 3. Bottom: Average metric coefficient on clean bands versus coefficient on noise band.

A fair measure of performance can be obtained from a simple artificial experiment, similar to the example above. We took a noise-free image (the standard Lena test image) and created a second band of random noise with the same mean and standard deviation. We then computed the fusion of these two bands using the algorithm in section II, both with a Euclidean metric and with the locally adaptive metric in section IV, as well as with the wavelet method in [8], using a Haar basis. The best possible result is obviously to recover the original noise-free image, with no corruption from the artificial noise band. Measuring the departure from the optimal result with the mean squared signal-to-noise ratio, we obtain the results in table I, corresponding to the images in figure 7. The local-entropy-based metric is clearly superior to the Euclidean one, and both outperform the (Euclidean) wavelet fusion in this example. The difference between the Euclidean fusion from section II and the wavelet result should not necessarily be interpreted as an indication that one method is superior to the other for general 2-band images. In this example, the mean value of the metric component corresponding to the Lena image is 0.83, while the mean metric component on the noise band is 0.17. Not surprisingly, referring to table I we see that 0.83/0.17 ≈ 97.47/20.10, which indicates that the capacity of the metric to distinguish structure from noise is almost directly transferred to the fusion algorithm.
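Spelled out, the two ratios compared in the last sentence are:

$$\frac{0.83}{0.17} \approx 4.88, \qquad \frac{97.47}{20.10} \approx 4.85.$$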


TABLE I
Mean squared signal-to-noise ratios for reconstruction of Lena image with different methods and metrics.

| Method | Euclidean | Local Entropy |
|---|---|---|
| Contrast Form | 20.10 | 97.47 |
| Haar Wavelet | 9.40 | N/A |


In order to test the sensitivity of the local entropy estimation to different degrees of noise, we conducted the following experiment. The standard Lena image (upper left in figure 7) was corrupted with additive Gaussian noise of different standard deviations, on a grid pattern over the image and again on the complement of that pattern. Thus we obtained a synthetic 2-band image, each band of which has added noise exactly where the other does not, for each choice of standard deviation. Figures 9(a) and 9(b) show an example of this for noise of standard deviation 45 grayvalues, with the added noise highlighted in red. For each such pair we computed the metric constructed in section IV, and made a pixel-wise decision as to which band was noisier, based on the magnitude of each metric component. The proportion of correctly classified pixels as a function of the standard deviation of the added noise is plotted in figure 8 (left). Note that even at a standard deviation of 2.5 grayvalues the percentage of correctly classified pixels is approximately 89%. The majority of misclassified pixels lie on the hair, where the correct image structure is almost impossible to detect locally without semantic clues, and along the grid boundaries.

Fig. 5. Close-ups from the images in Figure 4 showing the effect of the entropy-weighted metric (bottom images are gamma-corrected to show structure in dark areas).
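The construction of the synthetic pair and the pixel-wise decision can be sketched as follows. The grid pattern is not specified beyond being a grid, so the checkerboard of square cells used here, and its cell size, are hypothetical choices, as are the function names. In the experiment, the metric passed to the accuracy function would be the one from (1).

```python
import numpy as np


def complementary_noise_pair(image, sigma, cell=16, seed=0):
    """Two copies of `image`, noisy on complementary checkerboard cells.

    Returns (bands, mask): bands has shape (2, H, W); mask is True where
    band 0 carries the added Gaussian noise (band 1 is noisy elsewhere).
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    mask = ((rows // cell) + (cols // cell)) % 2 == 0
    noise = rng.normal(0.0, sigma, size=(2,) + image.shape)
    bands = np.stack([image, image])
    bands[0][mask] += noise[0][mask]
    bands[1][~mask] += noise[1][~mask]
    return bands, mask


def classification_accuracy(metric, mask):
    """Fraction of pixels where the band with the smaller metric component
    (taken to be the noisier one) matches the ground-truth mask."""
    predicted_band0_noisy = metric[0] < metric[1]
    return float(np.mean(predicted_band0_noisy == mask))


bands, mask = complementary_noise_pair(np.full((64, 64), 128.0), sigma=45.0)
# An ideal metric gives the noisy band the smaller component everywhere.
ideal = np.stack([np.where(mask, 0.2, 1.8), np.where(mask, 1.8, 0.2)])
print(classification_accuracy(ideal, mask))  # 1.0
```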


VI. Conclusion

We introduced a locally adaptive metric associated with a multimodal image, which models the relative likelihood that features from a given band are due to underlying image features rather than to noise. The underlying assumption is that neighborhoods with simpler geometric structure are more likely to arise from true features, while geometrically disorganized ones are likely to be due to noise. This metric can then be instantiated into the image fusion formalism previously proposed by the author, to yield a contrast fusion model which is more robust to the presence of image noise. We presented experimental results that clearly show the effect of the adaptive metric on real-life imagery with synthetic noise. Numerical measurements of the effect of the new metric on the resulting fusion show a marked improvement.

Fig. 7. From top left to bottom right: original image, result of Euclidean fusion, result of fusion with local-entropy-based metric, result of Haar wavelet fusion.

Fig. 8. Left: Proportion of correctly classified pixels as a function of the standard deviation of additive noise. Right: Ratio of metric components as a function of standard deviation of additive noise.

Two topics for future research should be clear from the construction above. Firstly, the metric we construct is pointwise diagonal, which means that contrasts from different bands are not allowed to interact with each other in a nonlinear fashion. The reason for placing such a restriction on the metric was simply that in the diagonal case the use of local gradient entropies becomes intuitive. However, there is no reason to believe that the ideal metric, if such an object exists, should be diagonal, and thus the construction of non-diagonal metrics that extend the results above should be explored. Secondly, as we noted in section IV, our entropy estimation relied on neighborhoods of fixed size throughout the image plane. It is reasonable to expect that if we knew a priori that a given pixel is surrounded by a geometrically structured region, then a smaller neighborhood would suffice for estimating the entropy, while larger neighborhoods would be required for pixels in noisy regions. This would allow the metric to follow the local characteristics of the multiband image more closely. However, when we have no a priori knowledge of the image noise, it is not clear how to vary the local neighborhood size.

Fig. 9. (a) and (b): Two bands of an artificial image with added Gaussian noise of standard deviation 45 grayvalues highlighted in red. (c) Misclassified pixels for image pair with noise of standard deviation 5 grayvalues.

Lastly, we should note that by its very nature this method is not well adapted to processing imagery with structured or periodic noise at the estimation scale. In that case, the entropy-based metric will detect the structure of the noise and mistakenly increase the weight assigned to bands containing such noise. In those situations, pre-processing each band separately with a filter in the frequency domain may yield better results.